Constructing Suffix Arrays of Large Texts

Author

  • K. Sadakane
Abstract

Recently, Sadakane [12] proposed a new fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes is important because the sorted array of suffix indexes, called the suffix array, is a memory-efficient alternative to the suffix tree. Suffix sorting is also used for the Burrows-Wheeler transformation in Block Sorting text compression, so fast sorting algorithms are desired. In full-text databases the texts are quite large, and this algorithm makes it possible to use the suffix array data structure and the compression scheme for such large texts. In this paper, we compare the suffix array construction algorithms of Bentley-Sedgewick, Andersson-Nilsson, and Karp-Miller-Rosenberg and the suffix tree construction algorithm of Larsson in terms of speed and required memory, and compare them with our new algorithm, which combines them to be both fast and memory efficient.
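
To make the two objects named in the abstract concrete, here is a minimal Python sketch of a suffix array and of the Burrows-Wheeler transformation derived from it. It uses plain prefix doubling (the idea usually associated with Karp, Miller and Rosenberg) and is not the algorithm proposed in the paper. The function names `suffix_array` and `bwt` and the sentinel convention are assumptions made for illustration only.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start indexes of all suffixes of `text` in lexicographic order.

    Illustrative prefix-doubling sketch, O(n log^2 n) with a comparison sort.
    """
    n = len(text)
    if n == 0:
        return []
    # rank[i] orders suffix i by its first k characters (initially k = 1).
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # Order by the first 2k characters: (rank of first k, rank of next k).
        # A suffix shorter than k + 1 characters gets -1 so it sorts first.
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            differs = key(sa[j]) != key(sa[j - 1])
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (1 if differs else 0)
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            return sa
        k *= 2


def bwt(text: str, sentinel: str = "\0") -> str:
    """Burrows-Wheeler transformation of `text` via its suffix array.

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`.
    """
    s = text + sentinel
    # The character preceding each sorted suffix; index -1 wraps to the sentinel.
    return "".join(s[i - 1] for i in suffix_array(s))


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
print(bwt("banana", "$"))      # annb$aa
```

The suffix array stores one integer per character of the text, which is where its memory advantage over a pointer-based suffix tree comes from. The sort inside the loop is the step that faster suffix sorting algorithms aim to speed up.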

Similar resources

Generalizations of suffix arrays to multi-dimensional matrices

We propose multi-dimensional index data structures that generalize suffix arrays to square matrices and cubic matrices. Giancarlo proposed a two-dimensional index data structure, the Lsuffix tree, that generalizes suffix trees to square matrices. However, the construction algorithm for Lsuffix trees maintains complicated data structures and uses a large amount of space. We present simple and practical ...

Unifying Text Search and Compression - Suffix Sorting, Block Sorting and Suffix Arrays -

Today many electronic documents are available, such as newspaper articles, dictionaries, books, and DNA sequences, and they are stored in databases. We also have many documents on the Internet and many e-mail documents. Therefore, fast queries on such huge collections of documents, and their compression to reduce the cost of storing or transferring them, are important. In this thesis, a unified ...

A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation

A new text database management method for distributed cooperative environments is proposed, which can collect texts from distributed sites over a narrow-bandwidth network and enables full-text search in a unified, efficient manner. This method is based on two new developments in full-text search data structures and data compression. Specifically, the Burrows-Wheeler transformation is used as ...

Engineering a Lightweight External Memory Suffix Array Construction Algorithm

We describe an external memory suffix array construction algorithm based on constructing suffix arrays for blocks of text and merging them into the full suffix array. The basic idea goes back over 20 years and there have been a couple of later improvements, but we describe several further improvements that make the algorithm much faster. In particular, we reduce the I/O volume of the algorithm by a fa...

A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation

We propose a fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes is important because the sorted array of suffix indexes, called the suffix array, is a memory-efficient alternative to the suffix tree. Suffix sorting is also used for the Burrows-Wheeler transformation in Block Sorting text compression, therefore fast sorting algorithms are desire...

Publication date: 1998